Efficient Summarization of URLs using CRC32 for Implementing URL Switching
Abstract
We investigate methods of using CRC32 to compress Web URL strings and to share URL lists between servers, caches, and URL switches. Using trace-based evaluation, we compare our new CRC32 digesting method against existing Bloom filter and incremental CRC19 methods. Our CRC32 method requires fewer CPU resources, generates digests of equal or smaller size, achieves equal collision rates, and simplifies switching.
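The abstract does not spell out the digesting procedure itself, so the following is only a minimal sketch of the general idea, assuming Python's zlib.crc32 as the digest function; the example URLs are invented for illustration.

```python
# A minimal sketch, assuming zlib.crc32 as the digest function: a URL
# list is summarized as a set of 32-bit CRCs, so sharing the list means
# shipping fixed-width integers instead of full URL strings.
# The example URLs below are invented for illustration.
import zlib

def url_digest(url: str) -> int:
    """Return the 32-bit CRC of a URL string."""
    return zlib.crc32(url.encode("utf-8")) & 0xFFFFFFFF

urls = [
    "http://example.com/index.html",
    "http://example.com/images/logo.png",
    "http://example.org/a/b/c",
]

digest_set = {url_digest(u) for u in urls}

# Membership tests now cost one CRC computation and one set lookup.
print(url_digest("http://example.com/index.html") in digest_set)  # True
print(url_digest("http://example.com/missing") in digest_set)     # False, barring a collision
```

Unlike a Bloom filter, such a digest set can be updated incrementally and inspected entry by entry, which is one plausible reading of the "simplifies switching" claim.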
Similar Articles
Feature-based Malicious URL and Attack Type Detection Using Multi-class Classification
Nowadays, malicious URLs are a common threat to businesses, social networks, net banking, and similar services. Existing approaches have focused on binary detection, i.e., deciding only whether a URL is malicious or benign. Very little of the literature addresses detecting malicious URLs together with their attack types. Hence, it becomes necessary to know the attack type in order to adopt an effective countermeasure. This pa...
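The snippet is cut off before the method details, but the general pipeline it describes (URL features feeding a multi-class classifier) can be sketched roughly as follows; the lexical features, class labels, and training URLs are invented, and scikit-learn's RandomForestClassifier is only a stand-in for whatever classifier the paper actually uses.

```python
# Hedged sketch of feature-based multi-class URL classification.
# Features, labels, and URLs below are invented illustrations.
from urllib.parse import urlparse
from sklearn.ensemble import RandomForestClassifier

def lexical_features(url: str) -> list:
    p = urlparse(url)
    return [
        len(url),                       # long URLs are often suspicious
        url.count("."),                 # many dots hint at subdomain abuse
        url.count("-"),
        sum(c.isdigit() for c in url),
        int(p.scheme == "https"),
        p.path.count("/"),              # path depth
    ]

train = [
    ("https://example.org/docs/guide", "benign"),
    ("http://secure-login.example-bank.top/verify?id=1", "phishing"),
    ("http://203.0.113.7/update/payload.exe", "malware"),
    ("https://example.com/news/2021/story", "benign"),
]
X = [lexical_features(u) for u, _ in train]
y = [label for _, label in train]

clf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
print(clf.predict([lexical_features("http://login-verify.example.top/account/123")]))
```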
Cluster-based Web Summarization
We propose a novel approach to abstractive Web summarization based on the observation that summaries for similar URLs tend to be similar in both content and structure. We leverage existing URL clusters and construct per-cluster word graphs that combine known summaries while abstracting out URL-specific attributes. The resulting topology, conditioned on URL features, allows us to cast the summar...
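As a loose illustration of the per-cluster word-graph idea (not the paper's actual construction), the sketch below merges two hypothetical summaries into an adjacency-count graph, with a <NAME> slot standing in for abstracted URL-specific attributes, and walks the heaviest edges to produce a consensus summary.

```python
# Loose sketch: merge known summaries for a URL cluster into a word
# graph whose edges count word adjacencies; a greedy walk along the
# most frequent outgoing edges approximates a consensus summary.
from collections import defaultdict

cluster_summaries = [
    "profile page of <NAME> on the example social network",
    "public profile of <NAME> with photos and posts",
]

edges = defaultdict(int)
for summary in cluster_summaries:
    words = ["<s>"] + summary.split() + ["</s>"]
    for a, b in zip(words, words[1:]):
        edges[(a, b)] += 1

node, out = "<s>", []
while node != "</s>" and len(out) < 12:
    node = max((b for (a, b) in edges if a == node),
               key=lambda b: edges[(node, b)])
    if node != "</s>":
        out.append(node)
print(" ".join(out))  # e.g. "profile page of <NAME> on the example social network"
```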
Performance Evaluation of URL Routing for Content Distribution Networks
As the World Wide Web continues to grow in size, content is being co-located throughout the world in Content Distribution Networks (CDNs). These CDNs need entirely new methods of distributing client requests. The idea of a URL router has been introduced, and this dissertation addresses the performance of URL routing. A URL router that uses HTTP redirection to automatically forward requests...
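A toy version of a redirection-based URL router can convey the mechanism: each request is answered with an HTTP 302 pointing at a replica chosen deterministically from the requested URL. The replica hostnames and the hash-based choice below are assumptions for illustration, not the dissertation's design.

```python
# Toy URL router: redirect each GET to a replica picked by hashing the
# requested path, so a given URL consistently lands on the same replica.
import zlib
from http.server import BaseHTTPRequestHandler, HTTPServer

REPLICAS = ["http://replica1.example.net", "http://replica2.example.net"]

class URLRouter(BaseHTTPRequestHandler):
    def do_GET(self):
        target = REPLICAS[zlib.crc32(self.path.encode()) % len(REPLICAS)]
        self.send_response(302)                     # HTTP redirection
        self.send_header("Location", target + self.path)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), URLRouter).serve_forever()
```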
In-memory URL Compression using AVL Tree
A common problem for large-scale search engines and web spiders is how to handle the huge number of URLs they encounter. Traditional search engines and web spiders store URLs on hard disk without any compression, which results in slow performance and greater space requirements. This paper describes a simple URL compression algorithm allowing efficient compression and decompression. The compression alg...
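A full AVL-tree implementation is too long to sketch here, so the fragment below shows only the prefix-sharing idea that makes in-memory URL storage compact: front coding over sorted URLs, where each entry records how many leading characters it shares with its predecessor. This is a stand-in for, not a reproduction of, the paper's algorithm.

```python
# Front-coding sketch: store (shared-prefix length, suffix) pairs over
# sorted URLs, exploiting the long common prefixes typical of URL lists.
import os

def compress(urls):
    prev, out = "", []
    for u in sorted(urls):
        shared = len(os.path.commonprefix([prev, u]))
        out.append((shared, u[shared:]))
        prev = u
    return out

def decompress(entries):
    prev, out = "", []
    for shared, suffix in entries:
        prev = prev[:shared] + suffix
        out.append(prev)
    return out

urls = ["http://example.com/a", "http://example.com/a/b", "http://example.com/c"]
packed = compress(urls)
assert decompress(packed) == sorted(urls)
print(packed)  # long common prefixes shrink to small integers
```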
WebParF: A Web partitioning framework for Parallel Crawlers
With the ever-proliferating size and scale of the WWW [1], efficient ways of exploring its content are of increasing importance. How can we efficiently retrieve information from the Web through crawling? In this "era of tera" and of multi-core processors, multi-threaded processes are a natural fit. Better still, how can we improve crawling performance by using parallel cr...
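One common Web-partitioning policy for parallel crawlers, which may or may not match WebParF's, is to hash each URL's hostname so that all pages of a site land on the same crawler process, preserving politeness constraints and link locality. A minimal sketch:

```python
# Generic hash-based partitioning baseline for a parallel crawler:
# every URL of a given host maps to the same crawler process.
import zlib
from urllib.parse import urlparse

NUM_CRAWLERS = 4

def assign_crawler(url: str) -> int:
    host = urlparse(url).hostname or ""
    return zlib.crc32(host.encode()) % NUM_CRAWLERS

frontier = [
    "http://example.com/a",
    "http://example.com/b",  # same host, so same crawler as /a
    "http://example.org/x",
]
for u in frontier:
    print(assign_crawler(u), u)
```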